\section{Results}
To measure the differences in understanding and enjoyment between the two groups, this section goes through the data analysis process, presents the results, and compares them to the hypotheses. The experiment was conducted with 15 participants in each group.

\subsection{Test of Understanding}
H01: \textbf{A} understands the rules at least as well as \textbf{B}.\\
HA1: \textbf{B} understands the rules better than \textbf{A}.

To combine the results from the three different methods of testing the player's understanding, the data was averaged and weighted. The average score for understanding the rules was weighted 35\%, the average score for the comprehension test was weighted 45\%, and the score for the player's description of the rules was weighted 20\%.

The description of the rules was weighted the lowest, since it is the least reliable method of the three. The question is very open, which can result in the player not answering adequately, and the facilitator does not ask any leading questions to make the player elaborate on their answer.

The score for understanding was weighted 35\%. As the facilitator can get more elaborate answers from this, it was deemed more reliable than the rules description score. However, as the score is evaluated by the facilitator, it is not completely reliable. It must be noted that only the understanding score set after the last puzzle was used, since this score represents the player's overall level of understanding when the game is over.

The comprehension test score received the highest weight, since it is the only method with closed questions with objectively correct and wrong answers.
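
As a sketch of how the three scores combine, and assuming that all three are expressed on a common scale before weighting (this is not stated explicitly above), the combined understanding score $U$ of a player can be written as
\[
U = 0.35\,u + 0.45\,c + 0.20\,d,
\]
where $u$ is the understanding score set after the last puzzle, $c$ is the comprehension test score, and $d$ is the score for the player's description of the rules. The symbols are introduced here for illustration only. The weights sum to 1, so $U$ stays on the same scale as its inputs. For purely hypothetical values $u = 0.8$, $c = 0.6$, and $d = 0.5$, this gives $U = 0.28 + 0.27 + 0.10 = 0.65$.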

With the scores for understanding combined for each player, the results of the two groups were compared using a t-test, as the data was normally distributed. The test gave a p-value of 0.08, so the difference between the groups is significant only at the 0.10 level and not at the conventional 0.05 level (see Figure \ref{fig:UnderstandingBoxplot}). Group A had the highest results in understanding, which is opposite to the direction predicted by HA1. This means that the null hypothesis for understanding cannot be rejected.
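
The exact variant of the t-test is not stated here. As an illustration only, assuming an independent two-sample Student's t-test with equal variances, the test statistic for two groups of $n = 15$ participants takes the form
\[
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2 + s_B^2}{n}}}, \qquad \nu = 2n - 2 = 28,
\]
where $\bar{x}_A$ and $\bar{x}_B$ are the group means of the combined score, $s_A^2$ and $s_B^2$ are the sample variances, and $\nu$ is the number of degrees of freedom. These symbols are introduced for this sketch and do not appear in the original analysis.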

\begin{figure}[htbp]
\centering
\includegraphics[width=0.50\textwidth]{Pictures/Design/UnderstandingBoxplot}
\caption{The median for A is higher than the median for B.}
\label{fig:UnderstandingBoxplot}
\end{figure} 

\subsection{Test of Enjoyment}
H02: \textbf{A} enjoys the game as much as \textbf{B}.\\
HA2: \textbf{A} enjoys the game more or less than \textbf{B}.

To measure the enjoyment for each group, an average enjoyment score was calculated.
The enjoyment scores gathered after each individual puzzle were combined with the overall enjoyment score from the questionnaire.

Both scores were weighted equally. The biggest difference between them is that the score after each puzzle is told to the facilitator, whereas the overall score is written by the player without the facilitator being able to see the answer. This can cause bias, as it might be harder for the player to be critical when talking to the facilitator. However, the overall score is potentially also biased due to the fact that the later parts of the game can influence the overall score more than earlier parts, as they are the easiest to recall.
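
Under the assumptions that the per-puzzle scores are averaged first and that both scores share the same scale, the combined enjoyment score $E$ of a player could be sketched as
\[
E = \frac{1}{2}\left(\frac{1}{k}\sum_{i=1}^{k} e_i\right) + \frac{1}{2}\,e_{\mathrm{overall}},
\]
where $k$ is the number of puzzles, $e_i$ is the enjoyment score told to the facilitator after puzzle $i$, and $e_{\mathrm{overall}}$ is the overall score written in the questionnaire. The notation is hypothetical and only serves to make the equal weighting explicit.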

Since the data was normally distributed, a t-test was used to compare the combined enjoyment scores of the two groups. The test yielded a p-value of 0.8, meaning that there is no significant difference between the two groups (see Figure \ref{fig:EnjoymentBoxplot}) and that the null hypothesis cannot be rejected.

\begin{figure}[htbp]
\centering
\includegraphics[width=0.50\textwidth]{Pictures/Design/ExperienceBoxplot}
\caption{The median values are almost the same for A and B.}
\label{fig:EnjoymentBoxplot}
\end{figure}

\subsection{Test of Possible Factors}
Besides asking questions directly regarding the hypotheses, additional data was gathered in order to examine possible factors that could have influenced the results.

In the questionnaire, the following question was asked: ``How much more enjoyable/entertaining do you think the game would have been with/without a written explanation of the core concept?''. This was measured on a scale from 1 to 10, where 1 is ``Much less enjoyable/entertaining'' and 10 is ``Much more enjoyable/entertaining''.

As seen in Figure \ref{fig:BetterWithOrWithoutRule}, both groups generally agree that the game would be more enjoyable without being given the rules in the beginning. This result fits well with the statement written in the introduction about complaints made by experienced gamers regarding extensive hand-holding in games.

\begin{figure}[htbp]
\centering
\includegraphics[width=0.50\textwidth]{Pictures/Design/BetterWithOrWithoutRule}
\caption{Red = Group A. Blue = Group B. The graph shows that both groups in general believe that the game would be more enjoyable without knowing the rules. Values above five indicate a higher interest in not being shown the rules. Values under five indicate the opposite.}
\label{fig:BetterWithOrWithoutRule}
\end{figure}

In the logged number of deaths for each puzzle, the two groups are similar to each other (see Figure \ref{fig:NumberOfDeaths}). However, in puzzles 2 and 4, participants without the rules tend to die more often. Likewise, the completion times for puzzles 2 and 4 tend to be longer for this group (see Figure \ref{fig:PuzzleCompletionTime}). This might indicate that these two puzzles are particularly hard to solve and/or have many tricky areas where players can fall through.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.75\textwidth]{Pictures/Design/NumberOfDeaths}
\caption{Filled = Group A. Shaded = Group B. The number of deaths is very similar; however, in puzzles 2 and 4, group B dies the most.}
\label{fig:NumberOfDeaths}
\end{figure}

\begin{figure}[htbp]
\centering
\includegraphics[width=0.75\textwidth]{Pictures/Design/PuzzleCompletionTime}
\caption{Red = Group A. Blue = Group B. The completion time is very similar; however, again in puzzles 2 and 4, group B performs worse.}
\label{fig:PuzzleCompletionTime}
\end{figure}

Further testing revealed that, with regard to completion time, enjoyment in puzzles 2 and 5 (here referred to as ``bad puzzles'') behaves inversely to enjoyment in the remaining puzzles (here referred to as ``good puzzles''). The average completion time per puzzle was 126 seconds. Players who spent longer than 126 seconds on a good puzzle reported higher enjoyment of that puzzle, while players who spent longer than 126 seconds on a bad puzzle reported lower enjoyment of it. With this information, it can be assumed that the bad puzzles are less enjoyable, meaning that players just want to get them over with. This might be because both puzzles contain dexterity challenges that the players simply fail.